(54) Prediction and coding of bi-directionally predicted video object planes for interlaced digital video



(57) A system for coding of digital video images such as bi-directionally predicted video object planes (B-VOPs) (420), in particular, where the B-VOP and/or a reference image (400, 440) used to code the B-VOP is interlaced coded. For a B-VOP macroblock (420) which is co-sited with a field predicted macroblock of a future anchor picture (440), direct mode prediction is made by calculating four field motion vectors (MV_f,top, MV_f,bot, MV_b,top, MV_b,bot), then generating the prediction macroblock. The four field motion vectors and their reference fields are determined from (1) an offset term (MV_D) of the current macroblock's coding vector, (2) the two future anchor picture field motion vectors (MV_top, MV_bot), (3) the reference field (405, 410) used by the two field motion vectors of the co-sited future anchor macroblock, and (4) the temporal spacing (TR_B,top, TR_B,bot, TR_D,top, TR_D,bot), in field periods, between the current B-VOP fields and the anchor fields. Additionally, a coding mode decision process for the current MB selects a forward, backward, or average field coding mode according to a minimum sum of absolute differences (SAD) error which is obtained over the top (430) and bottom (425) fields of the current MB (420).
... in particular, where the B-VOP and/or a reference image used to code the B-VOP is interlaced coded.

... entitled "MPEG-4 Video Verification Model Version 8.0", ISO/IEC JTC1/SC29/WG11 N1796, incorporated herein by reference. The MPEG-2 standard is a precursor to the MPEG-4 standard, and is described in document ISO/IEC 13818-2, entitled "Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," March 25, 1994, incorporated herein by reference.

... features. The flexible framework of MPEG-4 supports various combinations of coding tools and their corresponding functionalities for applications required by the computer, telecommunication, and entertainment industries, such as database browsing, information retrieval, and interactive communications.

MPEG-4 achieves efficient compression, object scalability, spatial and temporal scalability, and error resilience.

... overlapped block-motion compensation. Object shapes are represented as alpha maps and encoded using a Content-based Arithmetic Encoding (CAE) algorithm or a modified DCT coder, both using temporal prediction. ...

... motion estimation and compensation (ME/MC) and a two-dimensional (2-D) spatial transformation. The objective of ME/MC and the spatial transformation is to take advantage of temporal and spatial correlations in a video sequence to optimize the rate-distortion performance of quantization and entropy coding under a complexity constraint. The most common technique for ME/MC has been block matching, and the most common spatial transformation has been the DCT.

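... B-VOPs when the MB is itself interlaced coded ... It would also be desirable to have an efficient technique for direct mode coding of a field coded MB in a B-VOP. It would further be desirable to have a coding mode decision process for a MB in a field coded B-VOP for selecting the reference image which results in the most efficient coding.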

The present invention provides a system having the above and other advantages.
SUMMARY OF THE INVENTION 

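A method and apparatus are presented for coding of digital video images such as a current image in a bi-directionally predicted video object plane (B-VOP), in particular, where the current image and/or a reference image used to code the current image is interlaced (e.g., field) coded.

In a first aspect of the invention, a method provides direct mode motion vectors (MVs) for a current bi-directionally predicted, field coded image such as a macroblock (MB) having top and bottom fields, in a sequence of digital images. A past field coded reference image having top and bottom fields, and a future field coded reference image having top and bottom fields are determined. ... As a mnemonic, the direction of the prediction may be thought of as being opposite the direction of the corresponding MV.

Forward and backward MVs are determined for predicting the top and bottom fields of the current image by scaling the forward MV of the corresponding field of the future image.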
In particular, MV_f,top, the forward motion vector for predicting the top field of the current image, is determined according to the expression MV_f,top = (MV_top * TR_B,top)/TR_D,top + MV_D, where MV_D is a delta motion vector for a search area, TR_B,top corresponds to a temporal spacing between the top field of the current image and the field of the past image which is referenced by MV_top, and TR_D,top corresponds to a temporal spacing between the top field of the future image and the field of the past image which is referenced by MV_top. The temporal spacing may be related to a frame rate at which the images are displayed.

Similarly, MV_f,bot, the forward motion vector for predicting the bottom field of the current image, is determined according to the expression MV_f,bot = (MV_bot * TR_B,bot)/TR_D,bot + MV_D, where MV_D is a delta motion vector, TR_B,bot corresponds to a temporal spacing between the bottom field of the current image and the field of the past image which is referenced by MV_bot, and TR_D,bot corresponds to a temporal spacing between the bottom field of the future MB and the field of the past MB which is referenced by MV_bot.

MV_b,top, the backward motion vector for predicting the top field of the current MB, is determined according to the equation MV_b,top = ((TR_B,top - TR_D,top) * MV_top)/TR_D,top when the delta motion vector MV_D = 0, or MV_b,top = MV_f,top - MV_top when MV_D ≠ 0.

MV_b,bot, the backward motion vector for predicting the bottom field of the current MB, is determined according to the equation MV_b,bot = ((TR_B,bot - TR_D,bot) * MV_bot)/TR_D,bot when the delta motion vector MV_D = 0, or MV_b,bot = MV_f,bot - MV_bot when MV_D ≠ 0.

A corresponding decoder is also presented. 

In another aspect of the invention, a method is presented for selecting a coding mode for a current predicted, field 
coded MB having top and bottom fields, in a sequence of digital video MBs. The coding mode may be a backward 
mode, where the reference MB is temporally after the current MB in display order, a forward mode, where the reference 
MB is before the current MB, or average (e.g., bi-directional) mode, where an average of prior and subsequent refer- 
ence MBs is used. 

The method includes the step of determining a forward sum of absolute differences error, SAD_forward,field, for the current MB relative to a past reference MB, which corresponds to a forward coding mode. SAD_forward,field indicates the error in pixel luminance values between the current MB and a best match MB in the past reference MB. A backward sum of absolute differences error, SAD_backward,field, for the current MB relative to a future reference MB, which corresponds to a backward coding mode, is also determined. SAD_backward,field indicates the error in pixel luminance values between the current MB and a best match MB in the future reference MB.

An average sum of absolute differences error, SAD_average,field, for the current MB relative to an average of the past and future reference MBs, which corresponds to an average coding mode, is also determined. SAD_average,field indicates the error in pixel luminance values between the current MB and a MB which is the average of the best match MBs of the past and future reference MBs.

The coding mode is selected according to the minimum of the SADs. Bias terms which account for the number of 
required MVs of the respective coding modes may also be factored into the coding mode selection process. 

SAD_forward,field, SAD_backward,field, and SAD_average,field are determined by summing the component terms over the top and bottom fields.
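
The summation over the top and bottom fields can be illustrated with a minimal Python sketch. It assumes a 16-row luminance MB stored as a list of rows, with even rows forming the top field and odd rows the bottom field; the function names and the 8-row reference blocks passed in are hypothetical and stand in for the result of a prior best-match search.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized luminance blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def field_sad(current_mb, top_ref, bottom_ref):
    """Field SAD of a MB: the top-field term plus the bottom-field term.

    current_mb: 16 rows of luminance values (even rows = top field).
    top_ref, bottom_ref: 8-row best-match blocks for each field (assumed given).
    """
    top_field = current_mb[0::2]      # rows 0, 2, ..., 14
    bottom_field = current_mb[1::2]   # rows 1, 3, ..., 15
    return sad(top_field, top_ref) + sad(bottom_field, bottom_ref)
```

The same helper would apply to the forward, backward and average cases; only the reference blocks differ.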

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is an illustration of a video object plane (VOP) coding and decoding process in accordance with the present 
invention. 

FIG. 2 is a block diagram of an encoder in accordance with the present invention. 
FIG. 3 illustrates an interpolation scheme for a half-pixel search. 

FIG. 4 illustrates direct mode coding of the top field of an interlaced-coded B-VOP in accordance with the present 
invention. 

FIG. 5 illustrates direct mode coding of the bottom field of an interlaced-coded B-VOP in accordance with the 
present invention. 

FIG. 6 illustrates reordering of pixel lines in an adaptive frame/field prediction scheme in accordance with the 
present invention. 

FIG. 7 is a block diagram of a decoder in accordance with the present invention. 

FIG. 8 illustrates a macroblock layer structure in accordance with the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

A method and apparatus are presented for coding of a digital video image such as a macroblock (MB) in a bi-direc- 
tionally predicted video object plane (B-VOP), in particular, where the MB and/or a reference image used to code the 
MB is interlaced coded. The scheme provides a method for selecting a prediction motion vector (PMV) for the top and 
... element 107, VOP 118 represents the oblong foreground element 108, and VOP 119 represents the landscape backdrop element 109. A VOP can have an arbitrary shape, and a succession of VOPs is known as a video object. A full rectangular video frame may also be considered to be a VOP. Thus, the term "VOP" will be used herein to indicate both arbitrary and non-arbitrary (e.g., rectangular) image area shapes. A segmentation mask is obtained using known techniques, and has a format similar to that of ITU-R 601 luminance data. Each pixel is identified as belonging to a certain region in the video frame.
... The frame 185 thus includes the received VOPs 118 and 119 and the locally stored VOP 178.

... a VOP which is separate from the background, such as a news studio. The user may select a background from the library 170 or from another television program, such as a channel with stock price or weather information. ...

... entertainment, business and educational applications ...

... describes the displacement of the current MB relative to the best match MB. Additionally, an advanced prediction mode for P-VOPs may be used, where motion compensation is performed on 8x8 blocks rather than 16x16 MBs. Moreover, VOP MBs can be coded in a frame mode or a field mode.

... MB is compared to a search area of MBs in both a temporally previous anchor frame and a temporally subsequent
anchor frame to determine the best match MBs. Forward and backward MVs describe the displacement of the current 
MB relative to the best match MBs. Additionally, an averaged image is obtained from the best match MBs for use in 
encoding the current MB. 

With direct mode prediction of B-VOPs, a MV is derived for an 8x8 block when the collocated MB in the following P-VOP uses the 8x8 advanced prediction mode. The MV of the 8x8 block in the P-VOP is linearly scaled to derive a MV for the block in the B-VOP without the need for searching to find a best match block.

The encoder, shown generally at 200, includes a shape coder 210, a motion estimation function 220, a motion compensation function 230, and a texture coder 240, which each receive video pixel data input at terminal 205. The motion estimation function 220, motion compensation function 230, texture coder 240, and shape coder 210 also receive VOP shape information input at terminal 207, such as the MPEG-4 parameter VOP_of_arbitrary_shape. When this parameter is zero, the VOP has a rectangular shape, and the shape coder 210 therefore is not used.

A reconstructed anchor VOP function 250 provides a reconstructed anchor VOP for use by the motion estimation function 220 and motion compensation function 230. A current VOP is subtracted from a motion compensated previous VOP at subtracter 260 to provide a residue which is encoded at the texture coder 240. The texture coder 240 performs the DCT to provide texture information (e.g., transform coefficients) to a multiplexer (MUX) 280. The texture coder 240 also provides information which is summed with the output from the motion compensator 230 at a summer 270 for input to the previous reconstructed VOP function 250.

Motion information (e.g., motion vectors) is provided from the motion estimation function 220 to the MUX 280, while shape information which indicates the shape of the VOP is provided from the shape coding function 210 to the MUX 280. The MUX 280 provides a corresponding multiplexed data stream to a buffer 290 for subsequent communication over a data channel.

The pixel data which is input to the encoder may have a YUV 4:2:0 format. The VOP is represented by means of a bounding rectangle. The top left coordinate of the bounding rectangle is rounded to the nearest even number not greater than the top left coordinates of the tightest rectangle. Accordingly, the top left coordinate of the bounding rectangle in the chrominance component is one-half that of the luminance component.
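
The rounding rule for the bounding rectangle can be sketched in a few lines of Python. The function name and the tuple layout are hypothetical; the sketch only assumes integer luminance coordinates for the tightest rectangle.

```python
def bounding_rect_origin(tight_x, tight_y):
    """Top-left corner of the bounding rectangle in luma and chroma samples.

    Each luma coordinate is rounded down to the nearest even number, so the
    4:2:0 chroma coordinate is exactly half of it.
    """
    luma = (tight_x & ~1, tight_y & ~1)      # clear the lowest bit: even, not greater
    chroma = (luma[0] // 2, luma[1] // 2)    # chroma is subsampled by 2 in x and y
    return luma, chroma
```

For a tightest rectangle starting at (7, 10), the luminance origin becomes (6, 10) and the chrominance origin (3, 5).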

FIG. 3 illustrates an interpolation scheme for a half-pixel search. Motion estimation and motion compensation (ME/MC) generally involve matching a block of a current video frame (e.g., a current block) with a block in a search area of a reference frame (e.g., a predicted block or reference block). For predictive (P) coded images, the reference block is in a previous frame. For bi-directionally predicted (B) coded images, predicted blocks in previous and subsequent frames may be used. The displacement of the predicted block relative to the current block is the motion vector (MV), which has horizontal (x) and vertical (y) components. Positive values of the MV components indicate that the predicted block is to the right of, and below, the current block.

A motion compensated difference block is formed by subtracting the pixel values of the predicted block from those of the current block point by point. Texture coding is then performed on the difference block. The coded MV and the coded texture information of the difference block are transmitted to the decoder. The decoder can then reconstruct an approximated current block by adding the quantized difference block to the predicted block according to the MV. The block for ME/MC can be a 16x16 frame block (macroblock), an 8x8 block or a 16x8 field block.

Accuracy of the MV is set at half-pixel. Interpolation must be used on the anchor frame so that p(i+x,j+y) is defined for x or y being half of an integer. Interpolation is performed as shown in FIG. 3. Integer pixel positions are represented by the symbol "+", as shown at A, B, C and D. Half-pixel positions are indicated by circles, as shown at a, b, c and d. As seen, a = A, b = (A + B)//2, c = (A + C)//2, and d = (A + B + C + D)//4, where "//" denotes rounded division. Further details of the interpolation are discussed in MPEG-4 VM 8.0 referred to previously as well as commonly assigned U.S. Patent application Serial No. 08/897,847 to Eifrig et al., filed July 21, 1997, entitled "Motion Estimation and Compensation of Video Object Planes for Interlaced Digital Video", incorporated herein by reference.
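
The four interpolation formulas can be sketched in Python as follows. Note that the "//" of the text means rounded division, whereas Python's own `//` operator floors, so a helper is used. The tie-breaking rule (halves rounded up, for non-negative pixel values) is an assumption of this sketch; the function names are hypothetical.

```python
def rdiv(n, d):
    """Rounded division for non-negative n: nearest integer, halves rounded up."""
    return (n + d // 2) // d


def half_pel(A, B, C, D):
    """Values at positions a, b, c, d from the integer pixels A, B, C, D.

    A is the top-left pixel, B its right neighbour, C the pixel below A,
    and D the pixel diagonally below and to the right.
    """
    a = A                          # integer position, no interpolation
    b = rdiv(A + B, 2)             # horizontal half-pixel
    c = rdiv(A + C, 2)             # vertical half-pixel
    d = rdiv(A + B + C + D, 4)     # diagonal half-pixel
    return a, b, c, d
```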

FIG. 6 illustrates reordering of pixel lines in an adaptive frame/field prediction scheme in accordance with the present invention. In a first aspect of the advanced prediction technique, an adaptive technique is used to decide whether a current macroblock (MB) of 16x16 pixels should be ME/MC coded as is, or divided into four blocks of 8x8 pixels each, where each 8x8 block is ME/MC coded separately, or whether field based motion estimation should be used, where pixel lines of the MB are reordered to group the same-field lines in two 16x8 field blocks, and each 16x8 block is separately ME/MC coded.
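
The field-based reordering can be sketched in Python as below. The sketch assumes the MB is a list of 16 pixel rows and that even-numbered rows (counting from zero) belong to the top field; the function names are hypothetical.

```python
def to_field_blocks(mb):
    """Split a 16-row MB into two 16x8 field blocks (8 rows of 16 pixels each)."""
    top = mb[0::2]       # even lines: top (first) field
    bottom = mb[1::2]    # odd lines: bottom (second) field
    return top, bottom


def reorder_lines(mb):
    """Permute the MB so that the same-field lines are grouped together."""
    top, bottom = to_field_blocks(mb)
    return top + bottom
```

After reordering, the first eight rows hold the top field and the last eight the bottom field, so each 16x8 block can be motion compensated on its own.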

A field mode 16x16 macroblock (MB) is shown generally at 600. The MB includes even-numbered lines 602, 604, 606, 608, 610, 612, 614 and 616, and odd-numbered lines 603, 605, 607, 609, 611, 613, 615 and 617. The even and odd lines are thus interleaved, and form top and bottom (or first and second) fields, respectively.

When the pixel lines in image 600 are permuted to form same-field luminance blocks, the MB shown generally at 650 is formed. Arrows, shown generally at 645, indicate the reordering of the lines 602-617. For example, the even line 602, which is the first line of MB 600, is also the first line of MB 650. The even line 604 is reordered as the second line in MB 650. Similarly, the even lines 606, 608, 610, 612, 614 and 616 are reordered as the third through eighth lines,
"unidirectional" prediction. The predicted blocks of the B-VOP are determined differently for each mode. Furthermore, blocks of a B-VOP and the anchor block(s) may be progressive (e.g., frame) coded or interlaced (e.g., field) coded.

A single B-VOP can have different MBs which are predicted with different modes. The term "B-VOP" only indicates that bi-directionally predicted blocks may be included, but this is not required. In contrast, with P-VOPs and I-VOPs, bi-directionally predicted MBs are not used.

For non-direct mode B-VOP MBs, MVs are coded differentially. For forward MVs in forward and bi-directional modes, and backward MVs in backward and bi-directional modes, the "same-type" MV (e.g., forward or backward) of the MB which immediately precedes the current MB in the same row is used as a predictor. This is the same as the immediately preceding MB in raster order, and generally, in transmission order. However, if the raster order differs from the transmission order, the MVs of the immediately preceding MB in transmission order should be used to avoid the need to store and re-order the MBs and corresponding MVs at the decoder.

Using the same-type MV, and assuming the transmission order is the same as the raster order, and that the raster order is from left to right, top to bottom, the forward MV of the left-neighboring MB is used as a predictor for the forward MV of the current MB of the B-VOP. Similarly, the backward MV of the left-neighboring MB is used as a predictor for the backward MV of the current MB of the B-VOP. The MVs of the current MB are then differentially encoded using the predictors. That is, the difference between the predictor and the MV which is determined for the current MB is transmitted as a motion vector difference to a decoder. At the decoder, the MV of the current MB is determined by recovering and adding the PMV and the difference MV.

In case the current MB is located on the left edge of the VOP, the predictor for the current MB is set to zero.
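
The differential coding of same-type MVs along one MB row can be sketched as follows. This is a simplified illustration in Python: it assumes every MB in the row carries a MV of the type being coded, that transmission order equals raster order, and that MVs are (x, y) integer pairs. The function names are hypothetical.

```python
def encode_row(mvs):
    """Differentially encode the same-type MVs of one MB row, left to right."""
    diffs = []
    predictor = (0, 0)                 # left edge of the VOP: predictor is zero
    for mv in mvs:
        diffs.append((mv[0] - predictor[0], mv[1] - predictor[1]))
        predictor = mv                 # the left neighbour predicts the next MB
    return diffs


def decode_row(diffs):
    """Recover the MVs by adding each difference to the running predictor."""
    mvs = []
    predictor = (0, 0)
    for d in diffs:
        mv = (predictor[0] + d[0], predictor[1] + d[1])
        mvs.append(mv)
        predictor = mv
    return mvs
```

When neighbouring MBs move similarly, the transmitted differences are small, which is what makes the differential form cheaper to code than the raw MVs.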

For interlaced-coded B-VOPs, each of the top and bottom fields has two associated prediction motion vectors, for a total of four MVs. The four prediction MVs represent, in transmission order, the top field forward and bottom field forward of the previous anchor MB, and the top field backward and bottom field backward of the next anchor MB. The current MB and the forward MB, and/or the current MB and the backward MB, may be separated by one or more intermediate images which are not used for ME/MC coding of the current MB. B-VOPs do not contain INTRA coded MBs, so each MB in the B-VOP will be ME/MC coded. The forward and backward anchor MBs may be from a P-VOP or I-VOP, and may be frame or field coded.

For interlaced, non-direct mode B-VOP MBs, four possible prediction motion vectors (PMVs) are shown in Table 2 below. The first column of Table 2 shows the prediction function, while the second column shows a designator for the PMV. These PMVs are used as shown in Table 3 below for the different MB prediction modes.

Table 2

Prediction function         PMV type
Top field, forward          0
Bottom field, forward       1
Top field, backward         2
Bottom field, backward      3


Table 3

Macroblock mode             PMV type used
Frame, forward              0
Frame, backward             2
Frame, bi-directional       0,2
Field, forward              0,1
Field, backward             2,3
Field, bi-directional       0,1,2,3
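
Tables 2 and 3 can be captured as two lookup tables, as in the following Python sketch. The dictionary and function names, and the string keys used for the modes, are hypothetical; only the numeric PMV types come from the tables.

```python
# PMV type designators from Table 2.
PMV_TYPE = {
    ("top", "forward"): 0,
    ("bottom", "forward"): 1,
    ("top", "backward"): 2,
    ("bottom", "backward"): 3,
}

# PMV types used by each macroblock mode, from Table 3.
PMV_USED = {
    ("frame", "forward"): (0,),
    ("frame", "backward"): (2,),
    ("frame", "bi-directional"): (0, 2),
    ("field", "forward"): (0, 1),
    ("field", "backward"): (2, 3),
    ("field", "bi-directional"): (0, 1, 2, 3),
}


def pmvs_for(structure, direction):
    """Return the PMV type indices that a macroblock mode draws on."""
    return PMV_USED[(structure, direction)]
```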



For example, Table 3 shows that, for a current field mode MB with a forward prediction mode (e.g., "Field, forward"), 
Table 4

Prediction Motion Vector Index pmvf l 
Current Macroblock type 
field forward, top field backward, and bottom field backward) are calculated directly from the respective MVs of the corresponding MB of the future anchor picture.

The technique is efficient since the required searching is significantly reduced, and the amount of transmitted MV data is reduced. Once the MVs and reference field are determined, the current MB is considered to be a bi-directional field predicted MB. Only one delta motion vector (used for both fields) occurs in the bitstream for the field predicted MB. The prediction for the top field of the current MB is based on the top field MV of the MB of the future anchor picture (which can be a P-VOP, or an I-VOP with MV=0), and a past reference field of a previous anchor picture which is selected by the corresponding MV of the top field of the future anchor MB. That is, the top field MB of the future anchor picture which is correspondingly positioned (e.g., co-sited) to the current MB has a best match MB in either the top or bottom field of the past anchor picture. This best match MB is then used as the anchor MB for the top field of the current MB. An exhaustive search is used to determine the delta motion vector MV_D given the co-sited future anchor MV on a MB by MB basis.

Motion vectors for the bottom field of the current MB are similarly determined using the MV of the correspondingly positioned bottom field of the future anchor MB, which in turn references a best match MB in the top or bottom field of the past anchor picture.

Essentially, the top field motion vector is used to construct a MB predictor which is the average of (a) pixels obtained from the top field of the correspondingly positioned future anchor MB and (b) pixels from the past anchor field referenced by the top field MV of the correspondingly positioned future anchor MB. Similarly, the bottom field motion vector is used to construct a MB predictor which is the average of (a) pixels obtained from the bottom field of the correspondingly positioned future anchor MB and (b) pixels from the past anchor field referenced by the bottom field MV of the correspondingly positioned future anchor MB.

As shown in FIG. 4, the current B-VOP MB 420 includes a top field 430 and bottom field 425, the past anchor VOP MB 400 includes a top field 410 and bottom field 405, and the future anchor VOP MB 440 includes a top field 450 and bottom field 445.

The motion vector MV_top is the forward motion vector for the top field 450 of the future anchor MB 440 which indicates the best match MB in the past anchor MB 400. Even though MV_top is referencing a previous image (e.g., backward in time), it is a forward MV since the future anchor VOP 440 is forward in time relative to the past anchor VOP 400. In the example, MV_top references the bottom field 405 of the past anchor MB 400, although either the top 410 or bottom 405 field could be referenced. MV_f,top is the forward MV of the top field of the current MB, and MV_b,top is the backward MV of the top field of the current MB. Pixel data is derived for the bi-directionally predicted MB at a decoder by averaging the pixel data in the future and past anchor images which are identified by MV_b,top and MV_f,top, respectively, and summing the averaged image with a residue which was transmitted.
The motion vectors for the top field are calculated as follows:

MV_f,top = (TR_B,top * MV_top)/TR_D,top + MV_D;

MV_b,top = ((TR_B,top - TR_D,top) * MV_top)/TR_D,top if MV_D = 0; and

MV_b,top = (MV_f,top - MV_top) if MV_D ≠ 0.

MV_D is a delta, or offset, motion vector. Note that the motion vectors are two-dimensional.

Additionally, the motion vectors are integral half-pixel luma motion vectors. The slash "/" denotes truncate toward zero integer division. Also, the future anchor VOP is always a P-VOP for field direct mode. If the future anchor were an I-VOP, the MV would be zero and 16x16 progressive direct mode would be used. TR_B,top is the temporal distance in fields between the past reference field (e.g., top or bottom), which is the bottom field 405 in this example, and the top field 430 of the current B-VOP 420. TR_D,top is the temporal distance between the past reference field (e.g., top or bottom), which is the bottom field 405 in this example, and the future top reference field 450.
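
The top field calculation can be sketched in Python as below. Because the division truncates toward zero while Python's `//` floors, a small helper is needed for negative vector components. The sketch makes several assumptions: MVs are (x, y) integer pairs in half-pixel units, the test MV_D = 0 is applied to the whole vector, and the function and parameter names are hypothetical. The same function would serve for the bottom field by passing the bottom field anchor MV and temporal distances.

```python
def trunc_div(n, d):
    """Integer division truncating toward zero (Python's // floors instead)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def direct_field_mvs(mv_anchor, tr_b, tr_d, mv_delta=(0, 0)):
    """Forward and backward direct mode MVs for one field of the current MB.

    mv_anchor: field MV of the co-sited future anchor MB (MV_top or MV_bot).
    tr_b, tr_d: temporal distances TR_B and TR_D in field periods.
    mv_delta:   the delta motion vector MV_D.
    """
    mv_f = tuple(trunc_div(tr_b * m, tr_d) + dv
                 for m, dv in zip(mv_anchor, mv_delta))
    if mv_delta == (0, 0):
        mv_b = tuple(trunc_div((tr_b - tr_d) * m, tr_d) for m in mv_anchor)
    else:
        mv_b = tuple(f - m for f, m in zip(mv_f, mv_anchor))
    return mv_f, mv_b
```

As an illustration with made-up values, an anchor MV of (-7, 5) with TR_B = 1 and TR_D = 3 gives a forward MV of (-2, 1) and a backward MV of (4, -3) when MV_D is zero.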
FIG. 5 illustrates direct mode coding of the bottom field of an interlaced-coded B-VOP in accordance with the present invention. Note that the source interlaced video can have a top field first or bottom field first format. A bottom field first format is shown in FIGs 4 and 5. Like-numbered elements are the same as in FIG. 4. Here, the motion vector MV_bot is the forward motion vector for the bottom field 445 of the future anchor macroblock (MB) 440 which indicates the best match MB in the past anchor MB 400. In the example, MV_bot references the bottom field 405 of the past anchor MB 400, although either the top 410 or bottom 405 field could be used. MV_f,bot and MV_b,bot are the forward and backward motion vectors, respectively.

The motion vectors for the bottom field are calculated in a parallel manner to the top field motion vectors, as follows:

MV_f,bot = (TR_B,bot * MV_bot)/TR_D,bot + MV_D;

MV_b,bot = ((TR_B,bot - TR_D,bot) * MV_bot)/TR_D,bot if MV_D = 0; and

MV_b,bot = (MV_f,bot - MV_bot) if MV_D ≠ 0.



Regarding the examples of FIGs 4 and 5, the calculation of TR_D,top, TR_D,bot, TR_B,top and TR_B,bot is as follows:

TR_D,top or TR_D,bot = 2*(TR_future - TR_past) + δ; and
TR_B,top or TR_B,bot = 2*(TR_current - TR_past) + δ;
For efficient coding, an appropriate coding mode decision process is required. ... Specifically, seven biased SAD terms are calculated as follows:

(1) SAD_direct + b_1,

(2) SAD_forward + b_2,

(3) SAD_backward + b_2,

(4) SAD_average + b_3,

(5) SAD_forward,field + b_3,

(6) SAD_backward,field + b_3, and

(7) SAD_average,field + b_4,
where the subscripts indicate direct mode, forward motion prediction, backward motion prediction, average (i.e., interpolated or bi-directional) motion prediction, frame mode (i.e., locally progressive) and field mode (i.e., locally interlaced). The field SADs above (i.e., SAD_forward,field, SAD_backward,field, and SAD_average,field) are the sums of the top and bottom field SADs, each with its own reference field and motion vector. Specifically,

SAD_forward,field = SAD_forward,top field + SAD_forward,bottom field;
SAD_backward,field = SAD_backward,top field + SAD_backward,bottom field; and
SAD_average,field = SAD_average,top field + SAD_average,bottom field.

SAD_direct is the best direct mode prediction, SAD_forward is the best 16x16 prediction from the forward (past) reference, SAD_backward is the best 16x16 prediction from the backward (future) reference, SAD_average is the best 16x16 prediction formed by a pixel-by-pixel average of the best forward and best backward reference, SAD_forward,field is the best field prediction from the forward (past) reference, SAD_backward,field is the best field prediction from the backward (future) reference, and SAD_average,field is the best field prediction formed by a pixel-by-pixel average of the best forward and best backward reference.

The b_i's are bias values as defined in Table 6, below, to account for prediction modes which require more motion vectors. Direct mode and modes with fewer MVs are favored.

Table 6

Mode               Number of motion vectors   b_i    Bias          Value
Direct             1                          b_1    -(Nb/2+1)     -129
Frame, forward     1                          b_2    0             0
Frame, backward    1                          b_2    0             0
Frame, average     2                          b_3    (Nb/4+1)      65
Field, forward     2                          b_3    (Nb/4+1)      65
Field, backward    2                          b_3    (Nb/4+1)      65
Field, average     4                          b_4    (Nb/2+1)      129



The negative bias for direct mode is for consistency with the existing MPEG-4 VM for progressive video, and may result 
in relatively more skipped MBs. 
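
The biased decision can be sketched in Python as follows. The sketch assumes Nb = 256, which is consistent with the values 129 and 65 listed in Table 6 for a full 16x16 MB; the mode names used as dictionary keys and the function name are hypothetical, and the raw SAD values would come from the motion search.

```python
NB = 256  # assumption: reproduces the Table 6 values for a full 16x16 MB

# Bias terms b_1 .. b_4 from Table 6, keyed by a hypothetical mode name.
BIAS = {
    "direct": -(NB // 2 + 1),        # b_1 = -129
    "frame_forward": 0,              # b_2
    "frame_backward": 0,             # b_2
    "frame_average": NB // 4 + 1,    # b_3 = 65
    "field_forward": NB // 4 + 1,    # b_3
    "field_backward": NB // 4 + 1,   # b_3
    "field_average": NB // 2 + 1,    # b_4 = 129
}


def select_mode(sads):
    """Return the mode whose biased SAD (raw SAD plus bias) is smallest.

    sads maps each mode name in BIAS to its raw SAD.
    """
    return min(sads, key=lambda mode: sads[mode] + BIAS[mode])
```

With made-up raw SADs of 1000 for direct mode and 900 for frame forward mode, the biased terms are 871 and 900, so direct mode would be chosen even though its raw SAD is larger.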

FIG. 7 is a block diagram of a decoder in accordance with the present invention. The decoder, shown generally at 700, can be used to receive and decode the encoded data signals transmitted from the encoder of FIG. 2. The encoded video image data and differentially encoded motion vector (MV) data are received at terminal 740 and provided to a demultiplexer (DEMUX) 742. The encoded video image data is typically differentially encoded in DCT transform coefficients as a prediction error signal (e.g., residue).

A shape decoding function 744 processes the data when the VOP has an arbitrary shape to recover shape information, which is, in turn, provided to a motion compensation function 750 and a VOP reconstruction function 752. A texture decoding function 746 performs an inverse DCT on transform coefficients to recover residue information. For INTRA coded macroblocks (MBs), pixel information is recovered directly and provided to the VOP reconstruction function 752.

For INTER coded blocks and MBs, such as those in B-VOPs, the pixel information provided from the texture decoding function 746 to the VOP reconstruction function 752 represents a residue between the current MB and a reference image. The reference image may be pixel data from a single anchor MB which is indicated by a forward or backward MV. Alternatively, for an interpolated (e.g., averaged) MB, the reference image is an average of pixel data from two reference MBs, e.g., one past anchor MB and one future anchor MB. In this case, the decoder must calculate the averaged pixel data according to the forward and backward MVs before recovering the current MB pixel data.

For INTER coded blocks and MBs, a motion decoding function 748 processes the encoded MV data to recover the differential MVs and provide them to the motion compensation function 750 and to a motion vector memory 749, such as a RAM. The motion compensation function 750 receives the differential MV data and determines a reference motion vector (e.g., predictor motion vector, or PMV) in accordance with the present invention. The PMV is determined according to the coding mode (e.g., forward, backward, bi-directional, or direct).

Once the motion compensation function 750 determines a full reference MV and sums it with the differential MV of the current MB, the full MV of the current MB is available. Accordingly, the motion compensation function 750 can now retrieve anchor frame best match data from a VOP memory 754, such as a RAM, calculate an averaged image if required, and provide the anchor frame pixel data to the VOP reconstruction function to reconstruct the current MB.
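
The final reconstruction step at the decoder can be sketched in Python as below. The sketch assumes blocks are lists of pixel rows that have already been fetched according to the full MVs, 8-bit pixels, and an average that rounds halves up; the text only says "average", so the rounding and the clipping are assumptions, and the function name is hypothetical.

```python
def reconstruct_mb(residue, forward_ref=None, backward_ref=None):
    """Rebuild a MB from its residue and one or two reference blocks."""
    if forward_ref is not None and backward_ref is not None:
        # Interpolated mode: average the two references pixel by pixel.
        pred = [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
                for f_row, b_row in zip(forward_ref, backward_ref)]
    else:
        # Forward or backward mode: a single anchor block is the prediction.
        pred = forward_ref if forward_ref is not None else backward_ref
    # Add the residue and clip to the assumed 8-bit pixel range.
    return [[max(0, min(255, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, residue)]
```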
... an appropriate video data buffering capability may be required depending on the frame transmission and presentation orders.

... A first row 810 includes fields first_shape_code, MVD_sh, ... A second row 830 includes fields ... MBTYPE. A third row 850 includes fields ... The fields are read from top to bottom, and from left to right within a row.

... In interlaced mode, there can be up to four MVs per MB. MBTYPE indicates the coding type, i.e., forward, backward, bi-directional or direct. CBPB is the Coded Block Pattern for a B-type macroblock. ... DQUANT defines changes in the value of a quantizer.

$MVD_f$ is the motion vector of a MB in B-VOP with respect to the temporally previous reference VOP (an I- or a P-VOP).
It consists of a variable length codeword for the horizontal component followed by a variable length codeword for
the vertical component. For an interlaced B-VOP MB with an MBTYPE of forward or interpolate, $MVD_f$ represents a
pair of field motion vectors (top field followed by bottom field) which reference the past anchor VOP.

$MVD_b$ is the motion vector of a MB in B-VOP with respect to the temporally following reference VOP (an I- or a P-VOP).
It consists of a variable length codeword for the horizontal component followed by a variable length codeword for the
vertical component. For an interlaced B-VOP MB with an MBTYPE of backward or interpolate, $MVD_b$ represents a
pair of field MVs (top field followed by bottom field) which reference the future anchor VOP.

MVDB is only present in B-VOPs if direct mode is indicated by MBTYPE, and consists of a variable
length codeword for the horizontal component followed by a variable length codeword for the vertical component of
each vector. MVDB represents delta vectors that are used to correct B-VOP MB motion vectors which are obtained by
scaling P-VOP MB motion vectors.
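
The scaling and delta correction can be sketched in Python using the expressions recited in the claims: the forward field MV is $(MV \cdot TR_B) / TR_D + MV_D$ with integer division truncating toward zero, and the backward field MV is $((TR_B - TR_D) \cdot MV) / TR_D$ when $MV_D = 0$ and the forward MV minus $MV$ otherwise. The function names and the tuple representation are hypothetical, and treating "$MV_D = 0$" as a test on the whole vector rather than on each component is an assumption of this sketch.

```python
def div_trunc(a, b):
    """Integer division with truncation toward zero (b must be non-zero)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def direct_field_mvs(mv, tr_b, tr_d, mv_d=(0, 0)):
    """Return (forward, backward) MVs for one field of the current MB.

    mv    -- (x, y) forward MV of the co-sited future anchor field,
             in integer half-luma-pel units
    tr_b  -- field periods between the current field and the referenced
             past field
    tr_d  -- field periods between the future anchor field and the
             referenced past field
    mv_d  -- delta motion vector (MVDB); assumed to be tested as a whole
    """
    forward = tuple(div_trunc(c * tr_b, tr_d) + d for c, d in zip(mv, mv_d))
    if mv_d == (0, 0):
        backward = tuple(div_trunc((tr_b - tr_d) * c, tr_d) for c in mv)
    else:
        backward = tuple(f - c for f, c in zip(forward, mv))
    return forward, backward


# Hypothetical example: anchor field MV (4, -3), TR_B = 1, TR_D = 3.
fwd0, bwd0 = direct_field_mvs((4, -3), tr_b=1, tr_d=3)
fwd1, bwd1 = direct_field_mvs((4, -3), tr_b=1, tr_d=3, mv_d=(1, 0))
```

The same function would be called once for the top field and once for the bottom field, each with its own anchor MV and temporal spacings, giving the four field motion vectors of direct mode.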

CODA refers to gray scale shape coding.

A scheme for direct coding for a field coded MB is presented, in addition to a coding mode decision process [...]
the top and bottom [...] MB, including forward and backward [...] as required, as well as for frame coded MBs.

Although the invention has been described in connection with various specific embodiments, those skilled in the art
will appreciate that numerous adaptations and modifications may be made thereto without departing from the spirit and
scope of the invention as set forth in the claims.
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ing top and bottom fields, in a sequence of digital video images, comprising the steps of: 

determining a past field coded reference image having top and bottom fields, and a future field coded reference 
image having top and bottom fields; 

wherein the future image is predicted using the past image such that $MV_{top}$, a forward motion vector of
the top field of the future image, references one of the top and bottom fields of the past image, and $MV_{bot}$, a
forward motion vector of the bottom field of the future image, references one of the top and bottom fields of said
past image; and

determining forward and backward motion vectors for predicting at least one of the top and
bottom fields of the current image by scaling the forward motion vector of the corresponding field of the future
image.

2. The method of claim 1, wherein:

$MV_{f,top}$, the forward motion vector for predicting the top field of the current image, is determined according to
the expression $(MV_{top} \cdot TR_{B,top}) / TR_{D,top} + MV_D$;

where $TR_{B,top}$ corresponds to a temporal spacing between the top field of the current image and the
field of the past image which is referenced by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing between the
top field of the future image and the field of the past image which is referenced by $MV_{top}$, and $MV_D$ is a delta
motion vector.

3. The method of claim 2, wherein:

$MV_{f,top}$ is determined using integer division with truncation toward zero; and
$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion vectors.

4. The method of claim 2 or 3, wherein: 

$TR_{B,top}$ and $TR_{D,top}$ incorporate a temporal correction which accounts for whether said current field coded
image is top field first or bottom field first.

5. The method of one of the preceding claims, wherein: 

$MV_{f,bot}$, the forward motion vector for predicting the bottom field of the current image, is determined according
to the expression $(MV_{bot} \cdot TR_{B,bot}) / TR_{D,bot} + MV_D$;

where $TR_{B,bot}$ corresponds to a temporal spacing between the bottom field of the current image and the
field of the past image which is referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal spacing between the
bottom field of the future image and the field of the past image which is referenced by $MV_{bot}$, and $MV_D$ is a delta
motion vector.

6. The method of claim 5, wherein:

$MV_{f,bot}$ is determined using integer division with truncation toward zero; and
$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion vectors.

7. The method of claim 5 or 6, wherein:

$TR_{B,bot}$ and $TR_{D,bot}$ incorporate a temporal correction which accounts for whether said current field coded
image is top field first or bottom field first.

8. The method of one of the preceding claims, wherein:

$MV_{b,top}$, the backward motion vector for predicting the top field of the current image, is determined according to
one of the equations

(a) $MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top}$

and
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(b) $MV_{b,top} = MV_{f,top} - MV_{top}$;

where $TR_{B,top}$ corresponds to a temporal spacing between the top field of the current image and the
field of the past image which is referenced by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing between the
top field of the future image and the field of the past image which is referenced by $MV_{top}$, and $MV_{f,top}$ is the
forward motion vector for predicting the top field of the current image.

9. The method of claim 8, wherein: 

said equation (a) is selected when a delta motion vector $MV_D = 0$, and said equation (b) is selected when $MV_D \neq 0$.

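10. The method of one of the preceding claims, wherein:

$MV_{b,bot}$, the backward motion vector for predicting the bottom field of the current image, is determined according
to one of the equations

(a) $MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot}$

and

(b) $MV_{b,bot} = MV_{f,bot} - MV_{bot}$;

where $TR_{B,bot}$ corresponds to a temporal spacing between the bottom field of the current image and the
field of the past image which is referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal spacing between the
bottom field of the future image and the field of the past image which is referenced by $MV_{bot}$, and $MV_{f,bot}$ is the
forward motion vector for predicting the bottom field of the current image.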

11. The method of claim 10, wherein:

said equation (a) is selected when a delta motion vector $MV_D = 0$, and said equation (b) is selected when $MV_D \neq 0$.

determining a backward sum of absolute differences error, $SAD_{backward,field}$, for [...] relative
to a future reference macroblock [...];

determining an average sum of absolute differences error, $SAD_{average,field}$ [...]; and

selecting said coding mode according to the minimum of said SADs.

13. The method of claim 12, comprising the further step of:

selecting said coding mode according to the minimum of respective sums of said SADs [...]
bias terms which account for the number of required motion vectors [...].

14. The method of claim 12 or 13, wherein: 
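
$SAD_{forward,field}$ is determined according to a sum of: (a) a sum of absolute differences for the top field of the
current macroblock relative to a top field of the past reference macroblock, and (b) a sum of absolute differences
for the bottom field of the current macroblock relative to a bottom field of the past reference macroblock.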

15. The method of one of claims 12 to 14, wherein: 






EP 0 863 674 A2 



$SAD_{backward,field}$ is determined according to a sum of: (a) a sum of absolute differences for the top field of the
current macroblock relative to a top field of the future reference macroblock, and (b) a sum of absolute
differences for the bottom field of the current macroblock relative to a bottom field of the future reference
macroblock.

16. The method of one of claims 12 to 15, wherein:

$SAD_{average,field}$ is determined according to a sum of: (a) a sum of absolute differences for the top field of the
current macroblock relative to an average of the top fields of the past and future reference macroblocks, and
(b) a sum of absolute differences for the bottom field of the current macroblock relative to an average of the
bottom fields of the past and future reference macroblocks.

17. A decoder for recovering a current, direct mode, field coded macroblock having top and bottom fields in a sequence
of digital video macroblocks from a received bitstream, wherein said current macroblock is bi-directionally predicted
using a past field coded reference macroblock having top and bottom fields, and a future field coded reference
macroblock having top and bottom fields, comprising:

means for recovering $MV_{top}$, a forward motion vector of the top field of the future macroblock which references
one of the top and bottom fields of the past macroblock, and $MV_{bot}$, a forward motion vector of the bottom field
of the future macroblock which references one of the top and bottom fields of said past macroblock; and

means for determining forward and backward motion vectors for predicting at least one of the top and bottom
fields of the current macroblock by scaling the forward motion vector of the corresponding field of the future
macroblock.

18. The decoder of claim 17, further comprising:

means for determining $MV_{f,top}$, the forward motion vector for predicting the top field of the current macroblock,
according to the expression $(MV_{top} \cdot TR_{B,top}) / TR_{D,top} + MV_D$;

where $TR_{B,top}$ corresponds to a temporal spacing between the top field of the current macroblock and
the field of the past macroblock which is referenced by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing
between the top field of the future macroblock and the field of the past macroblock which is referenced by
$MV_{top}$, and $MV_D$ is a delta motion vector.

19. The decoder of claim 18, wherein: 

$MV_{f,top}$ is determined using integer division with truncation toward zero; and
$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion vectors.

20. The decoder of claim 18 or 19, wherein: 

$TR_{B,top}$ and $TR_{D,top}$ incorporate a temporal correction which accounts for whether said current field coded
image is top field first or bottom field first.

21. The decoder of one of claims 17 to 20, further comprising:

means for determining $MV_{f,bot}$, the forward motion vector for predicting the bottom field of the current
macroblock, according to the expression $(MV_{bot} \cdot TR_{B,bot}) / TR_{D,bot} + MV_D$;

where $TR_{B,bot}$ corresponds to a temporal spacing between the bottom field of the current macroblock
and the field of the past macroblock which is referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal spacing
between the bottom field of the future macroblock and the field of the past macroblock which is referenced by
$MV_{bot}$, and $MV_D$ is a delta motion vector.

22. The decoder of claim 21, wherein: 

$MV_{f,bot}$ is determined using integer division with truncation toward zero; and
$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion vectors.

23. The decoder of claim 21 or 22, wherein: 
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$TR_{B,bot}$ and $TR_{D,bot}$ incorporate a temporal correction which accounts for whether said current field coded image is top field first or bottom field first.

24. The decoder of one of claims 17 to 23, further comprising:

means for determining $MV_{b,top}$, the backward motion vector for predicting the top field of the current macroblock,
according to one of the equations

(a) $MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top}$

and

(b) $MV_{b,top} = MV_{f,top} - MV_{top}$;

where $TR_{B,top}$ corresponds to a temporal spacing between the top field of the current macroblock and
the field of the past macroblock which is referenced by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing
between the top field of the future macroblock and the field of the past macroblock which is referenced by
$MV_{top}$, and $MV_{f,top}$ is the forward motion vector for predicting the top field of the current macroblock.

25. The decoder of claim 24, further comprising:

means for selecting said equation (a) when a delta motion vector $MV_D = 0$; and
means for selecting said equation (b) when $MV_D \neq 0$.

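26. The decoder of one of claims 17 to 24, further comprising:

means for determining $MV_{b,bot}$, the backward motion vector for predicting the bottom field of the current
macroblock, according to one of the equations

(a) $MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot}$

and

(b) $MV_{b,bot} = MV_{f,bot} - MV_{bot}$;

where $TR_{B,bot}$ corresponds to a temporal spacing between the bottom field of the current macroblock
and the field of the past macroblock which is referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal spacing
between the bottom field of the future macroblock and the field of the past macroblock which is referenced by
$MV_{bot}$, and $MV_{f,bot}$ is the forward motion vector for predicting the bottom field of the current macroblock.
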
27. The decoder of claim 26, further comprising:

means for selecting said equation (a) when a delta motion vector $MV_D = 0$; and
means for selecting said equation (b) when $MV_D \neq 0$.
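
The coding mode decision recited in claims 12 to 16 can be illustrated with a short Python sketch. It assumes the past and future reference macroblocks have already been motion compensated and are given as lists of pixel rows with the top field on even rows and the bottom field on odd rows. The function names, the round-half-up average, and the optional `bias` dictionary (standing in for the bias terms of claim 13, whose values are not given here) are all hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def split_fields(mb):
    """Assumed layout: even rows are the top field, odd rows the bottom field."""
    return mb[0::2], mb[1::2]


def field_sad(cur, ref):
    """SAD of the top fields plus SAD of the bottom fields."""
    cur_top, cur_bot = split_fields(cur)
    ref_top, ref_bot = split_fields(ref)
    return sad(cur_top, ref_top) + sad(cur_bot, ref_bot)


def average(block_a, block_b):
    """Pixel-wise average; rounding half up is an assumption."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]


def select_field_mode(cur, past_ref, future_ref, bias=None):
    """Pick forward, backward or average mode by minimum (SAD + bias)."""
    bias = bias or {}
    costs = {
        "forward": field_sad(cur, past_ref),
        "backward": field_sad(cur, future_ref),
        "average": field_sad(cur, average(past_ref, future_ref)),
    }
    totals = {mode: cost + bias.get(mode, 0) for mode, cost in costs.items()}
    return min(totals, key=totals.get), totals


# Hypothetical 4x2 blocks.
cur = [[10, 10], [20, 20], [10, 10], [20, 20]]
past = [[10, 10], [20, 20], [10, 10], [20, 20]]
future = [[14, 14], [24, 24], [14, 14], [24, 24]]
mode, totals = select_field_mode(cur, past, future)
biased_mode, _ = select_field_mode(cur, past, future, bias={"forward": 20})
```

In an encoder each field would normally use its own motion vector, so the top and bottom reference fields would be fetched separately before the SADs are summed; the sketch skips that step.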
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